Psych 560 Principles of Psychological Measurement

Professor Neil H. Schwartz, Ph.D.

Guidelines for Reviewing a Test

 

Introduction

Reviewing a professionally published instrument is an excellent way to make the concepts of measurement come alive. Concepts of reliability, standard error of measurement, norms, validity, etc., begin to take on real meaning when evaluated in the context of a real psychological or educational measuring device. As such, you will have the opportunity to review a test and write up your review in a 6- to 10-page paper due Monday, May 23rd, 2005.

General Approach to the Review

When reviewing a test, it is best, as Salvia & Ysseldyke (2002) suggest, to adopt a "show me" attitude. That is, do not expect test authors to speak candidly about the shortcomings of their instrument. Test authors and test publishers are in business; while some are more conservative and cautious than others, at the end of the day they are all still in the business of "selling" their product. Thus, it will be up to you to evaluate the adequacy of a test in terms of its strength as a measuring device, and to determine the purpose for which, and the constituency on which, the test can be successfully used.

Also, be prepared to search for information. Not all test manuals are neatly organized and arranged, and pertinent information can be difficult to find. Thus, consider yourself more of an investigator searching for "truth" than a reporter simply describing what the manual tells you. Look for inconsistencies between the text and the tables of information in the manual. Be particularly alert to omissions of information. For example, test-retest reliabilities are often reported, but the test-retest interval is not. Thus, an instrument may show a high reliability value that was obtained from a test-retest interval of only 24 to 48 hours, an interval so short that memory and practice effects inflate the estimate.

Finally, when reading the text of the manual, pay particular attention to inconsistencies between what the authors say about the instrument, and what the data support. Often, authors will state information in the text of the manual that is not supported by empirical data. In short, read with a keen eye.

Materials You Will Need to Acquire

The materials you will need are:

  1. The test manual (essential)
  2. Test materials-- that is, stimulus materials that are shown to the examinee.
  3. Examinee record booklets, often called protocols. These are the sheets on which an examiner actually records and scores examinee performance.
  4. Supplemental test documents. Sometimes tests-- the PPVT-R, for example-- have what is called a technical supplement. There you will often find pages devoted to data. Many tests do not have this supplement, but if a test does have one, it is essential to obtain it, and evaluate the instrument using the information contained therein.

Components of the Review

In your review, plan on writing a critique of the instrument in terms of the following:

    1. Summarize what the test manual has to say about the test. Is it accurate? Is it overstated? Is it supported by data? What is the writing style of the manual? Is it pedantic, condescending, or difficult to decipher?
    2. Describe the norming population. Consider the representativeness of the norms, the size of the sample, and the proportionate distribution of the normative characteristics and elements.
    3. Discuss the kind of scores provided by the instrument. Are they linear transformations of the raw scores, or non-linear transformations? Is there sufficient cautionary discussion of the use of non-linear transformations? Does the manual describe the method used to derive age-equivalent and/or grade-equivalent scores? (A sketch of the standard score-transformation formulas appears after this list.)
    4. Discuss the accuracy of the instrument in terms of the reliability data. What kind of reliability was calculated? What limitations does each type impose? How high were the reliabilities? Who were the examinees on whom the reliability data were taken? How many examinees were in the samples? Is the test equally reliable for each age cohort of the instrument? Is the standard error of measurement of the instrument uniform across age cohorts? What type of reliability was used to establish the SEMs? Do you think the types of reliability estimates used to establish the SEMs are justified? (The textbook SEM formula is also sketched after this list.)
    5. Discuss the validity of the instrument. What kind of validity information is available in the test manual? How was it derived? Is there sufficient discriminant validity to establish what the test does and does not measure? How strong are the validity coefficients? Does the manual's discussion of validity match the data it provides? How old are the validity studies? What criteria were employed to establish the criterion-related validities?
    6. Finally, your review should conclude with a short paragraph on the adequacy of the instrument, based on your review, and the extent to which the test can, and should, be used.
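
As a point of reference for items 3 and 4 above, it may help to keep the standard textbook relationships in mind when checking a manual's derived-score and SEM tables. The standard-score metric shown below (mean of 100, standard deviation of 15) is only an illustrative choice; manuals differ in the metric they adopt.

    z = (X - M) / SD                  (linear transformation of a raw score X to a z-score)
    Standard score = 100 + 15z        (a further linear transformation; the 100/15 metric is illustrative)
    SEM = SD * sqrt(1 - r_xx)         (standard error of measurement, where r_xx is the reliability coefficient)

Note that percentile ranks and age- or grade-equivalent scores are non-linear transformations of the raw score, which is precisely why the cautionary discussion asked about in item 3 matters. Likewise, because the SEM depends directly on the reliability coefficient used to compute it, the choice of reliability estimate (item 4) determines how much confidence the SEM tables actually warrant.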